Cohen ME: Simulation study of methods to detect periodontal associations when they are inconsistent among subjects. Community Dent Oral Epidemiol 1993; 21: 19-23.


Abstract - Most statistical methods used to evaluate associations between indices of clinical periodontal diseases and purported prognostic markers test for effects across subjects. If associations exist within only a subset of subjects, however, associations may be masked, particularly in small studies. This issue was explored by using simulation to study four methods for detecting periodontal associations. Built into the simulations was the possible biological reality that a non-zero association between the two variables of interest (squared correlation coefficients, ρ², ranged from 0.1 to 0.9 depending on simulation), measured at 16 sites per subject, did not exist in all of 10 hypothetical subjects. The four methods for testing the null hypothesis that ρ = 0, or a related hypothesis, were: (1) Sites, analysis based on 160 sites incorrectly considered independent observations; (2) Subjects, analysis based on one score for each of 10 subjects; (3) Each Subject, separate analyses based on sites within each of 10 subjects, family-wise type I (α) error corrected for multiplicity; and (4) the Each Subject method where P-levels were estimated using permutation procedures rather than t-distributions. Each Subject methods were found to have greater relative power (although there are differences in null hypotheses) under conditions of heterogeneity in ρ and are considered to be particularly relevant in exploratory periodontal research when the primary interest is establishing the existence of a relationship, even if in only a subset of subjects.
Periodontal research is sometimes directed towards estimating the degree of relationship between the primary clinical variable of attachment loss and potential markers of current or future disease activity such as changes in gingival crevicular fluid (GCF) contents and quantities of various microorganisms. Researchers frequently study multiple sites over time in more than one patient, and are interested in testing hypotheses about population parameters.

In two recent studies (1, 2), for example, attachment loss was related to the enzyme aspartate aminotransferase (AST) in GCF. Data were analyzed at many levels, including patients, teeth within patients, and sites within teeth, but tests were directed towards population parameters. This is appropriate since AST, being associated with cell death, should have a consistent relationship to periodontal destruction in all sites in all subjects.

In the case of other possible periodontal predictors, however, such as the presence of a specific microorganism, the hypothesis of a universal and consistent association is not as compelling. The biological reality of the situation may be, for example, that attachment loss is related to different microorganisms in different subjects at different times. The presence of such effect heterogeneity is not unusual in epidemiological research and is typically compensated for by conducting studies with large sample sizes. In research on periodontal microbiota, however, determination of "risk exposure" and evaluation of "disease status" are expensive and labor intensive so that studies which are "large enough" may not be routinely feasible.

Given that periodontal studies of this nature may therefore be chronically underpowered relative to rejecting the traditional null hypothesis that association within the population is zero, a single subject strategy (3) may be useful for exploratory purposes. This strategy does not address the clearly more vital issue of association in the population, but may provide preliminary information that could both justify further research and make subsequent studies more efficient. The present research evaluates four approaches to data analysis, including testing of single subjects, under the biological assumption that the association between a disease and a selected marker exists only in some subjects tested.

Method 

Simulated trials (see Appendix A for consideration of the analytic alternative) were constructed where hypothesized attachment levels and counts of a specific microorganism were sampled at 16 sites in each of 10 patients on two occasions separated by time. Interest is directed towards the calculated correlation coefficient (r) between change in microorganism counts (X) and change in attachment level (Y). A biological reality constructed into the simulations was that the microorganism was associated (population correlation coefficient, ρ, is greater than zero) in only one through 10 subjects in the sample. For computational simplicity in the simulations, changes at the 16 sites (i = 1, ..., 16) in X and Y were random bivariate standard normal deviates. Four approaches for testing statistical significance were studied. In all cases, tests were two-tailed (critical r-values refer to absolute magnitude) with type I (α) error set at 0.05.
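The data-generating step described above can be sketched as follows. This is only an illustrative reconstruction, not the author's program: the function names (`simulate_subject`, `pearson_r`, `simulate_trial`) are hypothetical, and the construction y = ρx + sqrt(1 − ρ²)z is one standard way to obtain bivariate standard normal pairs with correlation ρ.

```python
import math
import random


def simulate_subject(rho, n_sites=16, rng=random):
    """Return n_sites (x, y) pairs from a bivariate standard normal with correlation rho."""
    pairs = []
    for _ in range(n_sites):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent noise
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        pairs.append((x, y))
    return pairs


def pearson_r(pairs):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def simulate_trial(rho_squared, n_associated, n_subjects=10, n_sites=16, rng=random):
    """One trial: the first n_associated subjects have correlation sqrt(rho_squared), the rest zero."""
    rho = math.sqrt(rho_squared)
    return [
        simulate_subject(rho if s < n_associated else 0.0, n_sites, rng)
        for s in range(n_subjects)
    ]
```

A call such as `simulate_trial(0.5, 3, rng=random.Random(1))` would give 10 lists of 16 pairs, with only the first three subjects carrying a non-zero ρ.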

Sites - The 160 sites (16 in each of 10 subjects) were incorrectly considered as independent observations and the null hypothesis that ρ = 0 was tested against the alternative hypothesis that ρ ≠ 0. With n = 160, critical r = 0.15523. Since sites are actually not independent, the validity of this method for making inferences to the population is compromised. However, this approach is unfortunately common in the literature (4) and is included for comparative purposes.

[Figure 1. Horizontal axis: change in bacterial concentration.]

Fig. 1. To demonstrate that the correlation computed on the basis of all sites may differ from that computed within any or all subjects, the figure shows hypothetical data for three sites in 10 subjects. The correlation is -1 for sites within every subject but close to +1 across all sites, although the latter analysis is not valid for inferential purposes. If the correlation was computed on mean subject data so that the independence problem is eliminated, the r-value would be +1. Such divergence in biological effects estimated within subjects versus across subjects would not be routinely anticipated but nevertheless highlights that Sites analysis addresses a conceptually different association than that of Subjects or Each Subject, irrespective of the validity issue.

In addition to inferential problems associated with site independence, this approach makes evident a related issue in that the Sites r-value need not be similar to any r observed among sites within individual subjects. For example, the within subject r between microbial concentration and attachment loss may be negative within every subject but, because of subjects' data location on a scale, positive across all sites. This is depicted in Fig. 1. The relationship inherent in the Sites r-value can be validly estimated by computing a correlation between microbial concentration and attachment, based on means (across sites) for each subject. This, however, does not address site specificity in the same way as when sites within subjects are studied.
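The divergence between within-subject and pooled correlations can be reproduced with a small constructed data set. The numbers below are hypothetical and chosen only to mimic the pattern described for Fig. 1 (three sites in each of 10 subjects); they are not the figure's data.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: each subject sits at a different location on both scales,
# but within a subject Y falls as X rises.
subjects = []
for s in range(10):
    xs = [10 * s + d for d in (-1, 0, 1)]
    ys = [10 * s - d for d in (-1, 0, 1)]
    subjects.append((xs, ys))

# Correlation among the three sites within each subject.
within_r = [pearson_r(xs, ys) for xs, ys in subjects]

# Correlation across all 30 sites, ignoring subject membership (the Sites approach).
pooled_r = pearson_r(
    [x for xs, _ in subjects for x in xs],
    [y for _, ys in subjects for y in ys],
)

# Correlation computed on subject means.
means_r = pearson_r(
    [sum(xs) / 3 for xs, _ in subjects],
    [sum(ys) / 3 for _, ys in subjects],
)
```

Here every within-subject r is negative while the pooled and subject-mean correlations are strongly positive, because the between-subject spread dominates the small within-subject spread.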

Subjects - The r for each of the 10 subjects was computed separately and the null hypothesis that mean ρ = 0 was tested by a one-sample t-test. With n = 10 (10 by-subject r-values), critical t = 2.26216. This method treats each subject's r as an index score representing a summary characteristic.
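A minimal sketch of the Subjects test follows, assuming the plain one-sample t statistic on the 10 within-subject r-values and the critical value quoted above. The function name `subjects_test` and the example r-values are hypothetical.

```python
import math
import statistics

# Two-tailed 0.05 critical t for 9 degrees of freedom, as quoted in the text.
# It is only appropriate when exactly 10 r-values are supplied.
CRITICAL_T_DF9 = 2.26216


def subjects_test(r_values, critical_t=CRITICAL_T_DF9):
    """One-sample t-test of the null hypothesis that the mean r is zero."""
    n = len(r_values)
    mean_r = statistics.mean(r_values)
    standard_error = statistics.stdev(r_values) / math.sqrt(n)
    t = mean_r / standard_error
    return t, abs(t) > critical_t


# Hypothetical example: one subject with a strong association, nine near zero.
heterogeneous = [0.8, -0.1, 0.05, -0.05, 0.1, 0.0, -0.1, 0.05, 0.0, -0.05]
t_het, reject_het = subjects_test(heterogeneous)

# Hypothetical example: a moderate association shared by all ten subjects.
homogeneous = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5, 0.4, 0.6, 0.5]
t_hom, reject_hom = subjects_test(homogeneous)
```

In the first example the single large r inflates the standard deviation as well as the mean, so the test does not reject; in the second the consistent r-values give a large t.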

Each Subject (parametric) - An r was computed for each of the ten subjects and each was tested for statistical significance. With n = 16 (pairs of observations within each subject), critical r = 0.49731. This significance test does not allow for inferences about the population (except that ρ = 0 for every subject in the population) but can be used to identify whether an effect exists in a particular subject. (Appendix B discusses another analysis strategy which uses individual P-values to test an overall null hypothesis.)
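The critical r-values quoted for the Sites and Each Subject methods are tied to critical t-values through the usual relation t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below, with hypothetical function names, converts in both directions; the t-value 2.14479 for 14 degrees of freedom is taken from standard tables and is not stated in the text.

```python
import math


def r_from_t(t, n):
    """Correlation corresponding to a t statistic with n - 2 degrees of freedom."""
    df = n - 2
    return t / math.sqrt(t * t + df)


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) corresponding to a correlation r."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


# n = 16 pairs within a subject: tabled two-tailed 0.05 critical t for 14 df.
critical_r_each_subject = r_from_t(2.14479, 16)

# n = 160 pooled sites: back out the t implied by the quoted critical r.
implied_t_sites = t_from_r(0.15523, 160)
```

The first conversion should agree with the critical r quoted for n = 16, and the second gives a t close to the large-sample value, as expected with 158 degrees of freedom.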

Since 10 subjects are tested in each trial, the probability of at least one subject having a significant r-value when each is tested at α = 0.05, and all nulls are true, is not 0.05, but rather $1 - 0.95^{10} = 0.40126$. This can be conceptualized as a family-wise α-error rate associated with the family-wise null hypothesis that ρ = 0 for every subject in the set.
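The family-wise rate above follows directly from the complement rule, assuming the 10 tests are independent. A short check, with a hypothetical function name:

```python
def familywise_error(alpha, m):
    """Probability of at least one false rejection among m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


rate_10_subjects = familywise_error(0.05, 10)
```

The same function shows how quickly the rate grows with the number of tests, which is the motivation for the adjusted procedures discussed next.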

Several methods have been proposed to adjust P-levels used for significance testing so that the family-wise α-error rate, rather than the individual rate, is maintained at 0.05. Consider the set of m null hypotheses $H_{(1)}, \ldots, H_{(m)}$ and the P-values, $P_{(1)}, \ldots, P_{(m)}$, corresponding to these hypotheses. A Bonferroni correction would require that $P_{(i)} < \alpha/m$ for rejection of each $H_{(i)}$. This approach is overly conservative. More accurate procedures which maintain the family-wise α-error rate at 0.05, but have increased power to detect effects when they exist in individual subjects, have been proposed by Holm (5) and by Hochberg (6). The latter method will always be as good as or better than the former (7) and was used in these simulations.

Hochberg's method requires that P-values are first ordered from smallest or most significant (i = 1) to largest or least significant (i = m) and that the $H_{(i)}$s are then sequentially tested in reverse order. Each of the $H_{(i)}$s is retained so long as $P_{(i)} > \alpha/(m - i + 1)$. The first time that the inequality is reversed, that $H_{(i)}$ and all remaining $H_{(i)}$s are rejected. In the present simulation where m = 10, if the largest P-value, $P_{(10)}$, is less than 0.05 = (0.05/(10 − 10 + 1), both tails) then all $H_{(i)}$s are rejected. If this is not the case, then if the second largest P-value, $P_{(9)}$, is less than 0.025 = (0.05/(10 − 9 + 1)) then the remaining nine $H_{(i)}$s are rejected, and so forth. To reduce computations, critical r-values rather than P-values were used in the simulations. The critical absolute r-values (based on n = 16) corresponding to P-values (in one tail) of 0.02500, 0.01250, 0.00833, 0.00625, 0.00500, 0.00417, 0.00357, 0.00313, 0.00278, and 0.00250 were 0.49731, 0.55702, 0.58771, 0.60782, 0.62259, 0.63411, 0.64362, 0.65145, 0.65832, and 0.66434, respectively. The family-wise null hypothesis was considered rejected if any of the $H_{(i)}$ individual null hypotheses were rejected.
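The step-up rule just described can be sketched on P-values directly. This is an illustration under the description above, not the simulation code itself (which used critical r-values); the function name `hochberg` is hypothetical.

```python
def hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda k: p_values[k])
    reject = [False] * m
    # Work backwards from the largest P-value (i = m) to the smallest (i = 1).
    for i in range(m, 0, -1):
        if p_values[order[i - 1]] <= alpha / (m - i + 1):
            # This hypothesis and all those with smaller P-values are rejected.
            for k in order[:i]:
                reject[k] = True
            break
    return reject
```

For example, with ten P-values of which only one is small, `hochberg([0.004, 0.5, 0.6, 0.7, 0.8, 0.9, 0.3, 0.4, 0.45, 0.55])` rejects only the first hypothesis, since 0.004 falls below 0.05/10 while every larger P-value exceeds its own threshold.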

Power for the first three test methodologies (Sites, Subjects, and Each Subject (parametric)) was evaluated by conducting 10 000 trials for each of 90 conditions formed by the factorial combination of 9 levels of correlation (ρ² = 0.1, 0.2, ..., 0.9) and 10 levels of number of subjects exhibiting that association (1 to 10). Other simulations, which are not reported, verified that alpha error levels were as expected.
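A rough Monte Carlo sketch of the power estimate for the Each Subject (parametric) method is given below. It assumes the data-generating model and the critical r-values described in the Method section; the function names, the seed, and the reduced default of 2000 trials are choices made here for illustration, so estimates will differ slightly from the reported 10 000-trial figures.

```python
import math
import random

# Critical |r| values for n = 16 quoted in the text, indexed by the Hochberg
# divisor (m - i + 1) = 1, 2, ..., 10. The list assumes m = 10 subjects.
CRITICAL_R = [0.49731, 0.55702, 0.58771, 0.60782, 0.62259,
              0.63411, 0.64362, 0.65145, 0.65832, 0.66434]


def sample_r(rho, n_sites, rng):
    """Pearson r for n_sites pairs drawn from a bivariate standard normal."""
    xs, ys = [], []
    for _ in range(n_sites):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * z)
    mx, my = sum(xs) / n_sites, sum(ys) / n_sites
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def each_subject_rejects(r_values):
    """True if the step-up rule rejects the null for at least one subject."""
    # Largest |r| corresponds to the smallest P-value (i = 1).
    ranked = sorted((abs(r) for r in r_values), reverse=True)
    m = len(ranked)
    # The i-th largest |r| is compared with the critical r for divisor m - i + 1.
    return any(ranked[i - 1] >= CRITICAL_R[m - i] for i in range(1, m + 1))


def estimate_power(rho_squared, n_associated, n_trials=2000, seed=0):
    """Proportion of simulated trials in which at least one subject's null is rejected."""
    rng = random.Random(seed)
    rho = math.sqrt(rho_squared)
    hits = 0
    for _ in range(n_trials):
        rs = [sample_r(rho if s < n_associated else 0.0, 16, rng) for s in range(10)]
        hits += each_subject_rejects(rs)
    return hits / n_trials
```

Calling `estimate_power(0.7, 1)` estimates power when a single subject has ρ² = 0.7, and `estimate_power(0.0, 0)` gives an estimate of the family-wise α-error rate under the complete null.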

The correctness of P-values used in these simulations is dependent upon the assumption (among others) that each measurement represents an independent observation. For Sites and Each Subject (parametric) methodologies, this is not the case. For the latter method, the problem of appropriate P-values can be avoided by estimating them in terms of a random assignment model rather than a random sampling model. This can be accomplished by using permutation tests.

Each Subject (permutation) - The r is first computed for the subject's original set of 16 pairs of observations. Data array location (i.e., position in the 16 "slots") for one of the variables, say Y, is then permuted by a random process to Y*, so that pairings between $x_i$ and $y_i^*$ themselves become random. Specifically, a new ordering of Y is generated such that $y_1^*$ is randomly selected out of the 16 original scores, $y_2^*$ is randomly selected out of the remaining 15 scores, and so forth. This process is repeated many times, say 9999, and an r computed each time. The P-value corresponding to the original r-value is then represented by the proportion of the 10 000 data sets (the 9999 permuted sets plus the original) where r is greater than or equal to the original r.
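The permutation procedure can be sketched as below. The comparison is written exactly as stated in the text (permuted r greater than or equal to the original r); the function names and the use of Python's `random.shuffle` to generate the reorderings are choices made here for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=9999, rng=random):
    """Proportion of data sets (permuted plus original) with r >= the original r."""
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    count = 1  # the original arrangement counts as one of the data sets
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random reassignment of Y to the slots
        if pearson_r(xs, shuffled) >= observed:
            count += 1
    return count / (n_permutations + 1)
```

With 9999 permutations the smallest attainable P-value is 1/10 000. Comparing absolute values of r instead would give a two-tailed analogue; that variant is an assumption here and is not spelled out in the description above.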

The permutation P-value is not dependent upon parametric or random sampling assumptions and is appropriate whenever single subject designs are utilized. However, generation of permutation statistics is computationally intensive. This is not a limitation in clinical research where a personal computer can produce a P-value in a few seconds, but does pose a problem for a simulation study where thousands of such P-values are required. At 9999 permutations for each of 10 subjects for each of 90 simulation sets of 10 000 trials each, the present study would require an additional 89 991 000 000 computations of r.

Table 1. Percentage of 10 000 trials where a null hypothesis was correctly rejected (power × 100). ρ² = 0.1 for the number of subjects indicated. ρ² = 0.0 for the remainder of the 10 subjects

No. of subjects    Sites(a)    Subjects    Each Subject(b)
       1             7.04        5.13          8.79
       2            12.90        7.64         12.78
       3            22.38       11.95         16.80
       4            36.01       18.73         20.30
       5            52.10       29.08         24.10
       6            67.22       41.15         26.93
       7            80.79       56.33         30.05
       8            90.15       71.06         31.91
       9            95.59       84.70         35.51
      10            98.42       93.52         38.83

(a) As noted in the text, analysis by sites is not valid. Power is reported for limited comparative purposes and should not be considered as legitimate.
(b) Power here refers to the event that the null hypothesis was rejected for at least one subject.

In order to evaluate how closely the permutation P-value is approximated by the parametric P-value, 1000 trials were simulated for each of four of the 90 conditions previously described. The four conditions were for ρ² = 0.3 for the case where one, two, three, or four subjects exhibited the relationship. Selection of these conditions was done after the initial findings were produced so that comparisons for a reasonable range of power levels could be evaluated. All procedures were the same as in the prior simulations except that, additionally, the P-value for each subject's r was estimated on the basis of 10 000 data permutations (including the original data set).

Results 

Tables 1 through 4 show power when 1 to 10 subjects had associations between X and Y resulting from random sampling of bivariate standard normal distributions with ρ² values of 0.1, 0.3, 0.5 and 0.7. These values correspond to low, moderate, high, and very high correlations, respectively (ρ = 0.316, 0.548, 0.707, and 0.837). Simulations that were previously described for other values of ρ² are not reported here since they are consistent with the trends established in these tables.

Analysis by Sites is always more powerful than analysis by Subjects under the conditions of these simulations. This would seem to be the motivation for the appearance of site analysis in some research where problems associated with the lack of independence have been ignored. Analysis by Subjects tended to be more powerful than analysis by Each Subject when (non-zero) ρ values were low and present in many subjects. Thus, analysis by Subjects showed greater relative power under conditions of greatest homogeneity. When there was heterogeneity in associations, the Each Subject method was more successful in achieving statistical significance. In the extreme, when ρ² = 0.7 for one or two subjects, analysis by Subjects had power in the vicinity of the alpha error level. Apparently, power was depressed by heterogeneity contributing to error variance. In contrast, analysis by Each Subject had power exceeding 0.95. Logically, this is not a remarkable finding but underscores the improvement in power which is possible when a single subject approach is adopted under these conditions.

Table 2. Percentage of 10 000 trials where a null hypothesis was correctly rejected (power × 100). ρ² = 0.3 for the number of subjects indicated. ρ² = 0.0 for the remainder of the 10 subjects

No. of subjects    Sites(a)    Subjects    Each Subject(b)
       1            11.24        4.85         30.22
       2            28.60        9.68         49.05
       3            55.23       1)1.78        62.54
       4            79.49       33.63         72.55
       5            93.80       54.94         79.92
       6            98.79       77.44         85.67
       7            99.83       92.47         88.95
       8           100.00       98.87         92.24
       9           100.00       99.95         93.98
      10           100.00      100.00         95.80

(a, b) See Table 1.

Table 3. Percentage of 10 000 trials where a null hypothesis was correctly rejected (power × 100). ρ² = 0.5 for the number of subjects indicated. ρ² = 0.0 for the remainder of the 10 subjects

No. of subjects    Sites(a)    Subjects    Each Subject(b)
       1            15.00        4.46         66.71
       2            42.95        9.99         89.20
       3            76.60       20.71         95.97
       4            94.46       41.42         98.70
       5            99.35       69.19         99.53
       6            99.94       90.51         99.83
       7           100.00       99.00         99.97
       8           100.00       99.98         99.96
       9           100.00      100.00         99.98
      10           100.00      100.00        100.00

(a, b) See Table 1.


Table 4. Percentage of 10 000 trials where a null hypothesis was correctly rejected (power × 100). ρ² = 0.7 for the number of subjects indicated. ρ² = 0.0 for the remainder of the 10 subjects

No. of subjects    Sites(a)    Subjects    Each Subject(b)
       1            19.11        3.87         95.16
       2            56.04        8.73         99.70
       3            88.08       22.47         99.99
       4            98.66       47.08        100.00
       5            99.82       77.56        100.00
       6           100.00       95.74        100.00
       7           100.00       99.88        100.00
       8           100.00      100.00        100.00
       9           100.00      100.00        100.00
      10           100.00      100.00        100.00

(a, b) See Table 1.

Permutation procedures exhibited power that was very close to parametric alternatives. At ρ² = 0.3 for one to four subjects, Each Subject (parametric) simulations based on 1000 trials found powers of 0.303, 0.485, 0.667, and 0.745, respectively. When P-levels were based on permutation procedures, powers of 0.308, 0.487, 0.651, and 0.734 were observed.

Discussion 

The results show that rejection of the traditional null hypothesis that mean ρ = 0 is infrequent in small studies when the effect of interest exists in only a small subset of subjects. This lack of power could be one reason why the incorrect analysis of sites as independent observations has been common. However, both approaches were less powerful than Each Subject analysis under conditions of heterogeneity.

Analysis using Each Subject (parametric and permutation) approaches, however, clearly restricts inference. It does not address the null hypothesis that mean ρ in the population is zero. But, if it is true that periodontal destruction is associated with different microorganisms in different subjects or in the same subject at different points in time, then appropriate testing of the traditional null hypothesis may require sample sizes that are not routinely available (although meta-analysis may be useful in addressing this problem).

Single subject designs are not common in the biomedical literature, though there are some exceptions (e.g., 8, 9). It appears that they are most valuable when designed to construct a patient specific treatment regime (8) or to demonstrate the simple existence of rare phenomena (9). In periodontal research, findings of statistical significance at the level of individual subjects may serve to justify and improve the efficiency of more definitive research.

The periodontal literature contains many studies in which association of disease with purported microbiological markers has not been confirmed. Recently, for example, Listgarten et al. (10) report "... the results indicate that the presence of the above bacterial species cannot of itself serve as a reliable predictor of future episodes of recurrent disease ..." and Magnusson et al. (11) report "... few, if any, of the 'classical' pathogens were detected in the plaque samples obtained at the time progressive disease was diagnosed." Absence of effects has, at times, been attributed to methodological limitations, justifying the development and use of more sensitive devices and procedures, and collection of data from larger samples. Analyses on individual subjects may be useful in evaluating such recommendations.

Consider that the across-subjects correlation between counts of a particular microorganism and change in attachment level is small and not statistically significant, but that an association has been identified in a few subjects using procedures to control for family-wise α-error level. At this point, it is known (at P < 0.05) that an association exists in some subjects in the population. This justifies additional research to estimate the prevalence of such associations and to develop procedures to identify subjects that are of this associative type. Further research directed specifically at these subjects might be more efficient. In contrast, if findings for individual subjects are also negative, continued research seems less justified.

The Each Subject approach can be considered to lie between purely exploratory and confirmatory data analysis. The former is most concerned with revealing patterns and features of the data and the latter with the reproducibility of those patterns and features in terms of statistical significance (12). In exploratory analysis correlations might be examined and, if there is evidence of heterogeneity, groups would be created based on biological explanations. However, if the biological bases for heterogeneity are not well founded then the analyst runs a risk of creating an "effect" solely on the basis of post hoc subject selection. In confirmatory analysis, consistency of effect may not be explored. If consistency is absent, however, then the analysis lacks power. The Each Subject approach is exploratory in the sense that it is directed towards effects which are heterogeneously distributed in the population but confirmatory in the sense that it controls for the multiplicity problem inherent in hypothesis testing of many subjects.

It can be argued that the Each Subject approach, by being potentially sensitive to an association in only a single subject, may provide findings that are not important to understanding the disease process in general. This is a valid and well founded epidemiological criticism but it is more relevant to the present progress and direction of periodontal research than to the statistical methodology. There may be very many periodontal pathogens and the biological basis may not be available for identifying one microorganism at a site as "the" pathogen versus all other microorganisms at that site. Therefore, the post hoc microbiological grouping of patients may be difficult to justify and the alpha error control inherent in the Each Subject methodology is important.

It seems that three analytic strategies may be selected based on a preliminary screening of the data. If associations are homogeneous then a Subjects type analysis should be used. If associations are heterogeneous but there is a reasonable biological basis for post hoc grouping then methods should be used which assume homogeneous associations within groups and heterogeneous associations between groups. If justifications for grouping are not forthcoming then the Each Subject approach may provide a valid test of the hypothesis that there is an association in some subjects.

In (unreported) simulations, α-error using parametric P-values was at nominal levels. Therefore, the use of permutation P-levels, which in simulation was shown to have similar power, would not appear to be required. However, measurements in the simulations were constructed to be bivariate standard normal deviates. In actual research settings non-normality is common and, therefore, the use of parametric P-values may be unacceptable and the use of permutation P-levels necessary.

The basic requirement of single subject designs is that there must be repeated observations within a subject, either over time or space, that provide a context for the randomization (to groups or to pairs). Periodontal sites within mouths would seem to offer numerous single subject or site test possibilities.

Although the issue of analysis at the level of the individual site has not been addressed, it is clear that the principles of permutation tests apply to them as well as to individual subjects. For example, an r might be computed between two variables observed in the same site on 12 occasions over a three year period. If it is true that different microorganisms are pathogenic at different sites, then it is logical to test for associations separately at different sites. The only difficulty arises if one is studying very many sites, say 16 in each of 10 subjects. With 160 "experiments" significant P-values may have to be as small as 0.0003 (0.05/160). This problem might be overcome by an a priori selection of sites based on risk.

Appendix

A. Simulation was chosen to estimate power since it would be sufficiently accurate for these purposes and could easily be applied to all procedures studied. However, as suggested by manuscript reviewers, it would seem that results by analysis could provide more mathematically elegant solutions. Such analytical solutions, though, would be different for each methodology and there are indications that some of these solutions would be more difficult to achieve than others.

B. The Subjects methodology addresses the null hypothesis that mean ρ across subjects is zero, while the Each Subject approach tests the null that ρ = 0 for each subject. A third alternative, not studied here, tests, through procedures where individual P-values are combined, the null hypothesis that all within subject nulls are true versus the alternate hypothesis that the null is false for at least one subject. This alternative is required when data from the different "experiments" (or subjects) cannot be reasonably pooled (e.g., where different parameters are being studied in different experiments) so that the Subjects methodology is precluded. The methodology may also be used as an alternative to weighting when there are unequal numbers of sites per subject, since P-values will incorporate weighting automatically. It is not, however, a reasonable alternative to the Each Subject methodology because it does not identify the specific subjects for which the null is not true.

When the combined P-value approach is advisable, Westberg (13) notes that Fisher's procedure, based on the product of P-values, is most powerful when many individual nulls are false to a comparable degree and Tippett's procedure, based on the minimum P-value, is most powerful when one null is false and very deviant. Westberg also described her own "adaptive method" that may have the advantages of both procedures.
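The two classical combination rules named above can be sketched as follows, assuming independent P-values that are all strictly greater than zero. The function names are hypothetical, the chi-square tail probability uses the closed form available for even degrees of freedom, and the adaptive method is not attempted here.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's procedure: -2 * sum(ln P) is chi-square with 2m df under the nulls."""
    m = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    # Closed-form upper tail of a chi-square with 2m df:
    # exp(-half) * sum_{k=0}^{m-1} half**k / k!
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k
        total += term
    return math.exp(-half) * total


def tippett_combined_p(p_values):
    """Tippett's procedure: overall P-value based on the minimum of m P-values."""
    m = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** m
```

With a single P-value both functions simply return it. With several P-values, Fisher's rule responds to many moderately small values and Tippett's rule to one very small value, which matches the power comparison described above.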